2022-06-04

Introduction

Leveraging textual data

Textual data comes from various sources

  • Blogs, forums, reviews
  • News websites, digitized newspapers
  • Public statements, reports, interviews, transcripts

=> Let’s see an example using textual data to monitor the state of the economy

Text-based index: Economic Policy Uncertainty

ECB’s main refinancing rate and inflation

Question: Does textual data improve the assessment of how likely a rate increase is in the coming months? What about non-standard monetary policies?

Today’s focus: Topic-sentiment time series

  • Texts are constructed as a sequence of ideas, often organized in paragraphs
  • Ideas can be presented positively or negatively

=> Texts contain multiple themes that display different positions/sentiments

How to extract this information? How to recover the author’s sentiment on a specific idea?

We need a tool that leverages both:

  • Sentiment analysis
  • Topic modeling

=> Today’s focus is on the creation of topic-dependent time series

Today’s data

It is relatively easy to retrieve the full text of ECB press conferences since the creation of the ECB. There are about 260 press conferences between 1998 and 2022.

Related literature

Dictionary-based sentiment analysis

Press conferences convey sentiment

Topic modeling

Topic modeling on ECB press conferences

Topic proportions over time

Topic-sentiment methodology

Joining sentiment and topical information

I focus on a method incorporating the independent results of sentiment analysis and topic modeling.

  • Sentiment analysis estimates the sentiment conveyed by a text (scalar value)
  • Basic topic modeling estimates the topic proportions in a text (vector of scalars summing to 1)

Quantity of interest

The quantity of interest is the topical sentiment attention \(c_{t,k}\) at any point in time. This represents the sentiment effectively conveyed by topic \(k\) in period \(t\).

The quantity \(c_{t,k}\) is distinct from the sentiment \(s_t\) measured in the same period, in the sense that it only accounts for the expressed sentiment relevant to topic \(k\).

My hope is that the quantity \(c_{t,k}\) proves to be more useful for economic applications than the sentiment \(s_t\).

Computation

At the document level (document \(i\) of period \(t\)):

  • Sentiment analysis estimates a conveyed sentiment \(s_{t,i}\)
  • Topic modeling estimates the topical proportions \(\theta_{t,i,k}\) (\(k \in \{1, \dots, K\}\))

Topic proportions can be used to select relevant documents. It is possible to use \(\theta_{t,i,k}\) directly in the time-series aggregation.
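
The two uses of the topic proportions can be contrasted in a minimal sketch. The document values, the 0.5 cutoff and the function names are illustrative assumptions and do not come from the ECB data.

```python
# Hypothetical documents of one period t: (sentiment s_{t,i}, proportions theta_{t,i,k}).
docs = [
    (0.8, [0.7, 0.2, 0.1]),
    (-0.4, [0.1, 0.6, 0.3]),
    (0.2, [0.5, 0.5, 0.0]),
]


def select_then_average(docs, k, cutoff=0.5):
    """Option 1: keep documents whose share of topic k reaches the cutoff,
    then average the sentiment of the kept documents."""
    kept = [s for s, theta in docs if theta[k] >= cutoff]
    return sum(kept) / len(kept) if kept else None


def weight_then_average(docs, k):
    """Option 2: use theta directly, averaging s * theta over all documents."""
    return sum(s * theta[k] for s, theta in docs) / len(docs)


selected = select_then_average(docs, k=0)
weighted = weight_then_average(docs, k=0)
```

Selection discards documents below the cutoff, while weighting keeps every document and lets the topic proportion scale its contribution.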

Two assumptions

We assume that the measured sentiment \(s_{t,i}\) is the sum of the sentiment conveyed by each topic, defined as the product of the topic-specific sentiment and the topical proportion:

\[ \underbrace{s_{t,i}}_{\substack{\text{sentiment}\\\text{analysis}}} = \sum^K_{k=1} \overbrace{s_{t,i,k}}^{??} \times \underbrace{\theta_{t,i,k}}_{\substack{\text{topic}\\\text{modeling}}} \]

As we only know the estimated \(s_{t,i}\) from sentiment analysis and the topic proportions \(\theta_{t,i,k}\) from topic modeling, each obtained independently, there is no unique solution for the values of \(s_{t,i,k}\).

To solve this problem, we assume that the topic-specific sentiment \(s_{t,i,k}\) is constant across topics within a text and equal to the text’s sentiment \(s_{t,i}\). Formally, \(s_{t,i,k} = s_{t,i}, \forall k \in K\).
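
As a quick check that this choice is consistent with the decomposition: topic proportions sum to 1 within a document, so substituting the assumption recovers the measured sentiment. The numbers in the second line are made up for illustration only.

\[ \sum_{k=1}^K s_{t,i,k}\,\theta_{t,i,k} = s_{t,i} \sum_{k=1}^K \theta_{t,i,k} = s_{t,i} \]

\[ s_{t,i} = 0.6,\ \theta_{t,i} = (0.5, 0.3, 0.2) \;\Rightarrow\; 0.6 \times 0.5 + 0.6 \times 0.3 + 0.6 \times 0.2 = 0.30 + 0.18 + 0.12 = 0.6 \]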

Assumption holds for short texts

Formally, \(s_{t,i,k} = s_{t,i}, \forall k \in K\).

This assumption becomes plausible (resp. implausible) when we have very short (resp. long) texts.

  • The shorter the text, the more likely that a single idea is represented.
  • On the other hand, more words in a text improve the accuracy of topic and sentiment detection.

At what level of text should topic modeling be applied?

  • Sentence
  • Paragraph
  • Article
  • Edition

Aggregation

Consequently, we compute the intermediate quantity \(c_{t,i,k} = s_{t,i,k} \times \theta_{t,i,k}\), which we define as the topical sentiment attention. This quantity represents the effective sentiment conveyed by a topic in a given document.

This quantity can then be aggregated into a time series for a sampling period \(t\) by taking the mean across documents:

\[ c_{t,k} = \frac{\sum_{i = 1}^{N_t} c_{t,i,k}}{N_t} \]

and it follows that \(s_t = \sum_{k=1}^K c_{t,k}\), because the topic proportions of each document sum to 1. In other words, this operation breaks the measured sentiment at a point in time into topic-specific sentiment quantities.
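
The aggregation can be sketched in a few lines of Python. The sentiments and proportions are hypothetical, the function name is an assumption, and \(s_t\) is taken to be the plain average of the document sentiments of the period.

```python
# Hypothetical inputs for one period t with N_t = 3 documents and K = 3 topics.
sentiments = [0.8, -0.4, 0.2]      # s_{t,i}
proportions = [                    # theta_{t,i,k}, each row sums to 1
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.5, 0.5, 0.0],
]


def topical_sentiment_attention(sentiments, proportions):
    """Return c_{t,k}: the mean over documents of s_{t,i} * theta_{t,i,k}."""
    n_docs = len(sentiments)
    n_topics = len(proportions[0])
    # Document level: c_{t,i,k} = s_{t,i} * theta_{t,i,k} (uses s_{t,i,k} = s_{t,i}).
    c_doc = [[s * th for th in theta] for s, theta in zip(sentiments, proportions)]
    # Period level: average each topic column across the documents.
    return [sum(row[k] for row in c_doc) / n_docs for k in range(n_topics)]


c_t = topical_sentiment_attention(sentiments, proportions)
s_t = sum(sentiments) / len(sentiments)  # average sentiment of the period
```

Summing `c_t` over topics gives back `s_t` up to floating-point error, which is the decomposition property stated above.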

Application to ECB’s decisions

ECB’s main refinancing rate and inflation

Question: Does textual data improve the assessment of how likely a rate increase is in the coming months? What about non-standard monetary policies?

Field-specific lexicon

The Picault & Renault (2017) lexicon was developed specifically to analyze the sentiment conveyed by ECB press conferences. It identifies n-grams (sequences of words) and classifies them into 6 categories.

Applying the Picault & Renault lexicon gives two sentiment values for each paragraph:

  • MP: the Monetary Policy stance, ranging from restrictive (+) to accommodative (-)
  • EC: the Economic Condition outlook, ranging from positive (+) to negative (-)
##          date                                          paragraph     MP     EC
##        <char>                                             <char> <char> <char>
## 1: 1998-06-09 (ii) it provisionally agreed on a budget for the E -1.695    0.8
## 2: 1998-06-09 (iii) it agreed on the framework for the organisat      0      0
## 3: 1998-06-09 Furthermore, the Governing Council decided on two  -2.633      2
## 4: 1998-06-09 (ii) with respect to the euro banknotes, a largely  -3.39    1.6

Time series of sentiment using field-specific lexicon

Time series are formed by averaging the measured sentiment across paragraphs, yielding MP (Monetary Policy) and EC (Economic Condition) sentiment measures for each press conference date.
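
A minimal sketch of this averaging step, using only the four paragraphs printed in the table above. The conference on that date presumably contains more paragraphs, so the resulting numbers are illustrative and not the actual series values; the function name is an assumption.

```python
from collections import defaultdict

# First four rows of the printed table; MP and EC are stored as text there,
# so they are converted to float before averaging.
rows = [
    ("1998-06-09", "-1.695", "0.8"),
    ("1998-06-09", "0", "0"),
    ("1998-06-09", "-2.633", "2"),
    ("1998-06-09", "-3.39", "1.6"),
]


def average_by_date(rows):
    """Average MP and EC across paragraphs for each press conference date."""
    grouped = defaultdict(list)
    for date, mp, ec in rows:
        grouped[date].append((float(mp), float(ec)))
    series = {}
    for date, values in sorted(grouped.items()):
        n = len(values)
        series[date] = {
            "MP": sum(mp for mp, _ in values) / n,
            "EC": sum(ec for _, ec in values) / n,
        }
    return series


series = average_by_date(rows)
```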

Topic model on ECB data

Topic-sentiment time series

At the document level (document \(i\) of period \(t\)):

  • Sentiment analysis estimates a conveyed sentiment \(s_{t,i}\)
  • Topic modeling estimates the topical proportions \(\theta_{t,i,k}\) (\(k \in \{1, \dots, K\}\))

Topic proportions can be used to select relevant documents. It is possible to use \(\theta_{t,i,k}\) directly in the time-series aggregation.

Topical sentiment series decomposition

The black line is the EC sentiment. The filled areas show the contribution of each topic to the EC sentiment.

Forecasting ECB’s decisions

Defining the ECB’s decision:

  • +1 if the interest rate is raised or the monthly target of the asset purchase program is lowered
  • -1 if the interest rate is lowered or the monthly target of the asset purchase program is raised
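
The coding rule can be written as a small function. This is a sketch: the function and argument names are hypothetical, the value 0 for "no change" is an assumption (the definition lists only +1 and -1), and meetings with mixed signals are rejected because the definition does not cover them.

```python
def code_decision(rate_change, purchase_target_change):
    """Code a policy meeting as +1 (tightening), -1 (easing) or 0 (no change).

    rate_change: change in the main refinancing rate at the meeting.
    purchase_target_change: change in the monthly asset purchase target.
    """
    tightening = rate_change > 0 or purchase_target_change < 0
    easing = rate_change < 0 or purchase_target_change > 0
    if tightening and easing:
        # Not covered by the definition above, so no value is guessed.
        raise ValueError("mixed signals are not covered by the definition")
    if tightening:
        return 1
    if easing:
        return -1
    return 0  # assumption: neither instrument changed


decisions = [code_decision(0.25, 0.0), code_decision(0.0, 20.0), code_decision(0.0, 0.0)]
```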

How to model the ECB’s decision? Following Picault & Renault (2017), we use a forward-looking Taylor monetary policy rule augmented with sentiment. Since \(\text{Decision}_t\) is a discrete outcome, we use an ordered probit model with the latent variable defined as:

\[ \text{Decision}_t^* = \alpha + \beta_1 s^{\text{EC}}_t + \beta_2 s^{\text{MP}}_t + X_t^{T}\beta + \epsilon_t, \]

with \(X_t\) a set of macroeconomic variables:

  • Eurozone HICP inflation minus 2%
  • Euro area industrial production (excluding construction)
  • 12-month-ahead inflation expectation from the ECB’s quarterly survey
  • European Commission economic sentiment indicator

To test the significance of the topic-specific sentiment, we compare two models:

  • one using EC and MP as the sentiment predictors
  • another adding the difference between the EC sentiment specific to the topic “Monetary policy analysis and strategy” and the simple EC sentiment
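
To make the ordered probit link concrete, the sketch below maps a latent index to outcome probabilities. Everything numeric is hypothetical: the coefficients, the inputs and the two cutpoints are not estimates from this study, and the three categories (-1, 0, +1) assume a "no change" outcome coded 0. The intercept is absorbed by the cutpoints, as is usual for this model.

```python
from statistics import NormalDist


def ordered_probit_probs(index, cut_low, cut_high):
    """Probabilities of Decision = -1, 0, +1 given the latent index,
    two cutpoints and standard normal noise."""
    cdf = NormalDist().cdf
    p_down = cdf(cut_low - index)        # latent value below the lower cutpoint
    p_up = 1.0 - cdf(cut_high - index)   # latent value above the upper cutpoint
    p_hold = 1.0 - p_down - p_up
    return {-1: p_down, 0: p_hold, 1: p_up}


# Hypothetical coefficients and inputs, chosen only to exercise the function.
beta_ec, beta_mp = 1.5, 0.3
s_ec, s_mp = 0.4, -0.2
macro_part = 0.1  # stands in for the macroeconomic term X_t' beta
index = beta_ec * s_ec + beta_mp * s_mp + macro_part

probs = ordered_probit_probs(index, cut_low=-1.0, cut_high=1.0)
```

A higher EC sentiment raises the index and shifts probability mass toward the +1 outcome, which is how a positive coefficient should be read.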

Sentiment vs Topic-specific sentiment

Ordered probit models on \(Decision_t\) (non-significant predictors not shown):

                                  Nowcasting                                            One-period ahead
                                  Model 1           Model 2           Model 3           Model 4           Model 5           Model 6
12-month inflation expectation    2.870*** (0.671)  2.383*** (0.707)  2.480*** (0.723)  2.595*** (0.660)  2.092** (0.695)   2.237** (0.711)
EC Sentiment                      -                 1.873*** (0.431)  3.224*** (0.658)  -                 1.806*** (0.432)  3.191*** (0.683)
MP Sentiment                      -                 0.301 (0.216)     0.313 (0.220)     -                 0.157 (0.214)     0.168 (0.218)
(EC Sentiment | Monetary policy analysis and strategy Topic) - (EC Sentiment)
                                  -                 -                 2.551** (0.900)   -                 -                 2.486** (0.918)
\(N\)                             238               238               238               237               237               237
\(Pseudo-R^2\)                    0.119             0.206             0.235             0.117             0.191             0.218

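As in Picault & Renault (2017), the Economic Condition sentiment is highly significant. Unlike in their results, however, the Monetary Policy sentiment is not significant.

Adding topical information to the sentiment is significant: the topic-specific difference term has a significant coefficient at both horizons, and the pseudo-\(R^2\) rises from 0.206 to 0.235 (nowcasting) and from 0.191 to 0.218 (one period ahead). This is an encouraging result for this methodology.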

Conclusion

  • There is clear value in analyzing the textual data published by the ECB.
  • Sentiment analysis and topic modeling can be used jointly, providing an improvement over sentiment alone.

Limitations

  • Topic modeling is “in-sample”: topical weights incorporate future information.
  • The predicted variable \(Decision_t\) is not well defined for recent periods:
    • The interest rate is locked at zero
    • With non-standard monetary policy, it is difficult to identify when the decision was taken.

sentopics package

Thank you!